A Cost Model for Similarity

Author

  • Paolo Ciaccia
Abstract

We consider the problem of estimating CPU (distance computations) and I/O costs for processing range and k-nearest neighbors queries over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performance of multi-dimensional access methods, a generic metric space offers no such possibility, which makes the problem quite different and requires a novel approach. We argue that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method [10]. Our results rely on the assumption that the indexed dataset comes from a metric space which is "homogeneous" enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the model over both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimize a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the vp-tree index structure [8].
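To make the role of the distance distribution concrete, here is a minimal Python sketch. It is not the model from the paper: the function names (`empirical_distance_distribution`, `F`, `expected_node_accesses`) are hypothetical, and the overlap rule used (a ball region with covering radius r_N is visited by a range query of radius r_Q when the distance from the query to the region's center is at most r_N + r_Q) is an assumption made for illustration. Under the homogeneity assumption stated in the abstract, the overall pairwise distance distribution stands in for the unknown distribution relative to the query object.

```python
import bisect
import itertools
import math


def empirical_distance_distribution(objects, dist):
    """Return all pairwise distances of the sample, sorted ascending."""
    return sorted(dist(a, b) for a, b in itertools.combinations(objects, 2))


def F(sorted_dists, x):
    """Empirical estimate of Pr{d(O1, O2) <= x} from the sorted sample."""
    if not sorted_dists:
        return 0.0
    return bisect.bisect_right(sorted_dists, x) / len(sorted_dists)


def expected_node_accesses(sorted_dists, covering_radii, query_radius):
    """Sum, over all regions, the estimated probability that a range
    query of the given radius overlaps the region (assumed rule:
    distance to the region's center <= covering radius + query radius)."""
    return sum(F(sorted_dists, r + query_radius) for r in covering_radii)


# Toy sample: the four corners of the unit square, Euclidean distance.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dists = empirical_distance_distribution(sample, math.dist)

# Hypothetical covering radii of three index regions.
estimate = expected_node_accesses(dists, [0.2, 0.5, 1.0], query_radius=0.5)
```

Only a distance function is needed, so the same sketch applies to any metric space; the estimate degrades when the data are far from homogeneous in the sense described above.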

Similar resources

Waste Collection Vehicle Routing Problem Considering Similarity Pattern of Trashcan

Collection of waste is an important logistic activity within any city. In this study, a mathematical model is proposed in order to reduce the cost of waste collection. First, a mixed-integer nonlinear programming model is provided, including a waste collection routing problem, so that there is a balance between the distance between trashcans and the similarity of the trashcans in terms of th...

Effect of Detonation Modeling on the Calculation of Blast Wave Properties from a Fuel-Air Cloud Explosion

In this work, the effect of the accuracy of FAE detonation modeling on the generated blast wave is investigated. First, a one-dimensional numerical simulation with reduced chemical kinetics of C2H2-O2-Ar, involving 25 elementary reactions, is used as the base model. The properties of the blast calculated with this model are compared with those of simpler models, the similarity solution of Tay...

A Novel Method for Tracking Moving Objects using Block-Based Similarity

Extracting and tracking active objects are two major issues in surveillance and monitoring applications such as nuclear reactors, mine security, and traffic controllers. In this paper, a block-based similarity algorithm is proposed in order to detect and track objects in the successive frames. We define similarity and cost functions based on the features of the blocks, leading to less computati...

Evaluation of wheat genotypes under tillage practices: application of technique for order preference by similarity to ideal solution method

Adoption of conservative agriculture at farm level is associated with reduced production costs and leads to crop yield stability. The aim of this study was to prioritize experimental treatments based on different criteria by applying the "technique for order preference by similarity to ideal solution" (TOPSIS). A field experiment was carried out at Zarghan research station, Fars province, Iran,...

Dimensional Similarity in the Study of Microbubble Production Inside Venturi Tube

The present study uses dimensional analysis to examine water and air flow and microbubble production inside a venturi tube. Numerical analysis of microbubble creation in a venturi tube requires fast computers and large amounts of storage space. Up to now, there has been no numerical analysis concerning microbubble creation, and all other existing studies are experimental. To s...

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...

Publication date: 1998